Abstract: This paper develops a reliability methodology for assessing
the consequences of using hardware that has previously failed to
perform per specification, without failure analysis or corrective
action. The subject arose from the need for a detailed probabilistic
analysis to calculate the change in probability of failure with respect
to the base or non-failed hardware.

The methodology used for the analysis is based primarily on
principles of Monte Carlo simulation. The random variables in the
analysis are the Maximum Time of Operation (MTO) and the Operation Time
of each Unit (OTU). A unit is considered to have failed if its OTU is
less than the MTO for the Normal Operational Period (NOP) in which the
unit is used. Each NOP uses a total of 4 units. Two cases are
considered. In the first, a system failure is considered to happen if
any one of the units used during the NOP fails. In the second, a system
failure is considered to happen only if any two of the units used during
the NOP fail together. The probability of failure of the units and of
the system as a whole is determined for 3 kinds of systems: Perfect
System, Imperfect System 1, and Imperfect System 2. In a Perfect System,
the operation time of the failed unit is the same as the MTO. In
Imperfect System 1, the operation time of the failed unit is assumed to
be 1% of the MTO. In Imperfect System 2, the operation time of the
failed unit is assumed to be zero and, in addition, the simulated
operation times of the failed unit after its failure are assumed to be
10% of the corresponding values before its failure. Software was
developed as part of this study to perform the reliability calculations.

The results of the analysis showed that the predicted change in
failure probability (P_F) for the previously failed units is as high as
49% above the baseline (Perfect System) for the worst case. The
predicted change in system P_F for the previously failed units is as
high as 36% for single unit failure without any redundancy. For
redundant systems, with dual unit failure, the predicted change in
system P_F for the previously failed units is as high as 16%. These
results will help management to make decisions regarding the
consequences of using previously failed units without adequate failure
analysis or corrective action.



INTRODUCTION 


The subject of this paper arose from a situation experienced under
operational conditions. A hardware unit failed to perform per
specification under certain cryogenic conditions. When the unit was
removed from service and ground tested, it again failed to operate per
specification within a specific temperature range. Similarly, in
another operation, a second unit failed to operate under similar
circumstances. To duplicate the operational scenario, cryogenic testing
was performed, and both units failed to close in the temperature range
of -60° to -80° F. However, both units would actuate if the
energization switch were held (energized) for a long period of time
rather than actuated momentarily. The units operated nominally at room
temperature and at cryogenic temperatures above and below the -60° to
-80° F range. In addition, another similar unit failed to operate in
the -60° to -80° F temperature range during an acceptance test
procedure (ATP) following manufacturing. In all, three units failed to
operate nominally in a narrow temperature range but did operate when
energized longer than normal. One option considered was to return all
three units to operational use, since the range in which failures had
been experienced was very narrow. This option necessitated the
development of a reliability methodology to assess the consequences of
operating with known failed hardware such as the units discussed above.
It is the conclusion of this study that these units have a high
probability of failing again without adequate failure analysis or
corrective action, because continuous energization of the unit does not
constitute an effective workaround for the non-conformance of these
units. The object of this paper is to define the methodology developed
and used to calculate the change in probability of failure with respect
to the base or non-failed hardware for the detailed probabilistic
analysis. This work is an extension of the previous work by Mikula,
et al. [1] dealing with single unit failure.



BRIEF LITERATURE REVIEW

Principles of probabilistic analysis have been used extensively for
the solution of practical problems for the last two decades. Freudenthal
[2], Cornell [3], Hasofer and Lind [4], and Ang [5] have done
fundamental work in this direction. Freudenthal [2] mainly discussed
the safety of a member subjected to a variable random load. Cornell [3]
dealt with the concept of a probability-based code in place of the
traditional deterministic code. Hasofer and Lind [4] defined the
reliability index as the shortest distance to the failure surface. Ang
[5] mainly dealt with structural risk analysis on a reliability basis.
There have been many applications of these fundamental concepts to
various practical problems. Some of the noteworthy applications are
those of Ravindra and Galambos [6], Rackwitz and Fiessler [7], Ellyin
and Putcha [8], MacGregor, et al. [9], Putcha [10], Ellingwood, et al.
[11], and Ayyub and Haldar [12]. Much work has been reported in the
literature, both on fundamental applications of reliability concepts
and in the applied field [13-15]. The reader is advised to refer to the
references for a summary of the extensive literature review conducted.

RANDOM VARIABLE IDENTIFICATION

The methodology used for the probabilistic analysis is discussed in
the next section. First, the random variables in the problem are
identified. They are the Maximum Time of Operation (MTO) and the
Operation Time of each Unit (OTU). The collected data for MTO (in
hours) is given in Table 1. Four units are associated with each of two
Normal Operational Periods (NOP). Two units were supposed to have
failed to operate during normal use in two different systems (defined
herein as Operation 1 and Operation 2). The six other unfailed units
are available for study. Hence, OTU data for all of the above-mentioned
units (failed as well as unfailed) is collected from history
documentation prepared by Mikula [16]. Probabilistic analysis is done
in this study for two kinds of data. One set of data is classified as
"SPECIFIC DATA", which deals with the OTU associated with the six
unfailed units. This data is shown in Table 2. The other kind of data
is classified as "ALL DATA", in which the OTU of all units is collected.
This data is shown in Table 3. For the failed units, the OTU data is
classified under three categories, namely, Perfect System (PS),
Imperfect System 1 (IPS-1), and Imperfect System 2 (IPS-2). Before
discussing the data set for each of these systems for probabilistic
analysis, some explanation regarding the two failed units is necessary.

Both were used successfully for some period of time. Hence, the
data for these units is a mixture of two parts: the OTU of these units
in operations without failure, where the OTU is assumed equal to the
MTO of that NOP, and the OTU of these units in the periods where they
are supposed to have failed. It is in the latter part that a
distinction is made between the three types of systems.

The following discussion relates to units 1 through 4 (Table 2)
associated with Operation 1. For a Perfect System (PS), the total data
for probabilistic analysis consists of the OTU for unfailed units 1, 2,
and 3 (assumed equal to the MTO of Operation 1) and the OTU for the
failed unit 4. As previously indicated, the latter includes data for
unit 4 from other operations in which it did not fail, along with data
for failed unit 4 corresponding to Operation 1 (also assumed equal to
the corresponding MTO). For Imperfect System 1 (IPS-1), the total data
for probabilistic analysis consists of the OTU for unfailed units 1, 2,
and 3 (again assumed equal to the MTO of Operation 1) and the OTU for
failed unit 4. The latter includes the data for unfailed unit 4 along
with data for failed unit 4 corresponding to the operation in
consideration, which is assumed to be 1% of the corresponding MTO. For
Imperfect System 2 (IPS-2), the total data for probabilistic analysis
consists of the OTU for unfailed units 1, 2, and 3 (again assumed equal
to the MTO of Operation 1) and the OTU of failed unit 4. The latter
includes the data for unfailed unit 4, the data for failed unit 4
corresponding to the operation in consideration, which is assumed to be
zero, and the simulated OTU values for unit 4. It is to be noted that
no data exists regarding the OTU values of the failed unit 4 after its
use. So, for this analysis it is assumed that the simulated OTU values
for unit 4 after its failure are 10% of the corresponding OTU values
before its failure. The same discussion regarding Perfect System,
Imperfect System 1, and Imperfect System 2 also applies to the OTU
values of unfailed units 5, 6, and 7 and failed unit 8 associated with
another operation (Operation 2). The MTO values for the case of
"SPECIFIC DATA" associated with Operations 1 and 2 are tabulated in
Tables 4 and 5, respectively. These are used in conjunction with the
Table 2 OTU values, while the MTO values tabulated in Table 1 for the
case of "ALL DATA" are used in conjunction with the Table 3 OTU values.
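
The three treatments of the failed unit's record can be illustrated with a short sketch. It is only one possible reading of the description above: the function name, the sample hours, and the choice to add one simulated post-failure value per earlier record (for IPS-2) are assumptions, not details taken from the study's software or from Tables 1 through 5.

```python
from typing import List, Sequence


def build_failed_unit_otu(
    pre_failure_otu: Sequence[float],
    failed_operation_mto: float,
    system: str,
    post_failure_fraction: float = 0.10,
) -> List[float]:
    """Assemble the OTU sample of a previously failed unit.

    system is "PS", "IPS-1" or "IPS-2".
    """
    data = list(pre_failure_otu)  # operations in which the unit did not fail
    if system == "PS":
        data.append(failed_operation_mto)         # OTU taken equal to the MTO
    elif system == "IPS-1":
        data.append(0.01 * failed_operation_mto)  # OTU taken as 1% of the MTO
    elif system == "IPS-2":
        data.append(0.0)                          # OTU taken as zero
        # Assumed reading: one simulated post-failure value per earlier
        # record, each at 10% of the corresponding pre-failure value.
        data.extend(post_failure_fraction * t for t in pre_failure_otu)
    else:
        raise ValueError(f"unknown system: {system!r}")
    return data


# Hypothetical hours, not taken from the paper's tables.
history = [120.0, 150.0, 135.0]
mto_at_failure = 140.0
for name in ("PS", "IPS-1", "IPS-2"):
    print(name, build_failed_unit_otu(history, mto_at_failure, name))
```

Under this reading the IPS-2 sample is both longer and shifted toward small values, which is why it would be expected to give the largest failure probability of the three.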
METHODOLOGY 

The basic methodology used for probabilistic analysis is that of
Monte Carlo simulation. This method is well discussed in the literature
[17, 18]. As pointed out in the previous section, the random variables
are the OTU of the various units and the MTO. Two kinds of
distributions are assumed for the random variables: normal and uniform.
Two kinds of failures are considered: component failure and system
failure. Both of these failures are discussed below.
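
How the distribution parameters were obtained from the collected data is not stated here. A minimal sketch, assuming the normal distribution is fitted by the sample mean and standard deviation and the uniform distribution by the sample minimum and maximum, might look as follows. The function names and the sample values are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def fit_normal(data: Sequence[float]) -> Tuple[float, float]:
    """Assumed fit: sample mean and sample standard deviation."""
    return statistics.mean(data), statistics.stdev(data)


def fit_uniform(data: Sequence[float]) -> Tuple[float, float]:
    """Assumed fit: smallest and largest observed values."""
    return min(data), max(data)


def draw(dist: str, params: Tuple[float, float], rng: random.Random) -> float:
    """Draw one random value from the fitted distribution."""
    if dist == "normal":
        mu, sigma = params
        return rng.gauss(mu, sigma)
    if dist == "uniform":
        low, high = params
        return rng.uniform(low, high)
    raise ValueError(f"unknown distribution: {dist!r}")


# Hypothetical OTU record in hours.
otu_hours = [120.0, 150.0, 135.0, 140.0]
rng = random.Random(0)
print(draw("normal", fit_normal(otu_hours), rng))
print(draw("uniform", fit_uniform(otu_hours), rng))
```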

Component Failure 

The basic equation is given below:

$$P_F = P(\mathrm{OTU} < \mathrm{MTO}) \tag{1}$$

where P(·) = probability of the event under consideration
      OTU  = operation time of the unit under consideration
      MTO  = maximum time of operation
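
Equation (1) can be estimated by counting how often a sampled OTU falls below a sampled MTO. The sketch below is illustrative only: the sampler interface, the trial count, and the distribution parameters are assumptions, and OTU and MTO are drawn independently, which the paper does not state explicitly.

```python
import random
from typing import Callable

Sampler = Callable[[random.Random], float]


def component_failure_probability(
    draw_otu: Sampler,
    draw_mto: Sampler,
    n_trials: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P_F = P(OTU < MTO)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        if draw_otu(rng) < draw_mto(rng):
            failures += 1
    return failures / n_trials


# Hypothetical parameters in hours: OTU ~ N(100, 10), MTO ~ N(90, 10).
p_f = component_failure_probability(
    draw_otu=lambda rng: rng.gauss(100.0, 10.0),
    draw_mto=lambda rng: rng.gauss(90.0, 10.0),
)
print(f"estimated P_F = {p_f:.3f}")
```

For these two independent normal variables the difference OTU - MTO is normal with mean 10 and standard deviation sqrt(200), so the exact value is about 0.24, which gives a quick check on the estimate.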



Monte Carlo simulation is used for evaluation of failure
probabilities. Units 4 and 8 are considered for evaluation of component
failure, as these are supposed to have failed during Operation 1 and
Operation 2, respectively. The data for "SPECIFIC DATA" (Tables 2, 4,
and 5) and "ALL DATA" (Tables 1 and 3) are used for evaluation of the
probability of failure for units 4 and 8. Data for all three systems
(Perfect System, Imperfect System 1, and Imperfect System 2) are used
for evaluation of P_F values, as discussed in the previous section.

System Failure 

The failure of any unit (defined by Equation (1)) is assumed to 
result in a potential loss of system. Since a system consists of four 
units, this would imply that the failure of any unit results in the 
failure of the system itself for single unit failure or when there is no 
redundancy in the system. Expressing mathematically [17, 18], 

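$$(P_F)_{\text{Operation 1}} = P\left[\{\mathrm{OTU}_1 < \mathrm{MTO}\} \cup \{\mathrm{OTU}_2 < \mathrm{MTO}\} \cup \{\mathrm{OTU}_3 < \mathrm{MTO}\} \cup \{\mathrm{OTU}_4 < \mathrm{MTO}\}\right] \tag{2}$$

$$(P_F)_{\text{Operation 2}} = P\left[\{\mathrm{OTU}_5 < \mathrm{MTO}\} \cup \{\mathrm{OTU}_6 < \mathrm{MTO}\} \cup \{\mathrm{OTU}_7 < \mathrm{MTO}\} \cup \{\mathrm{OTU}_8 < \mathrm{MTO}\}\right] \tag{3}$$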
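
The union of the four unit-failure events can be estimated by declaring a trial failed as soon as any unit's sampled OTU is below the MTO. The sketch below assumes one MTO draw per trial shared by all four units, which is one reading of the single MTO symbol in the union expressions. The sampler interface and the parameter values are hypothetical.

```python
import random
from typing import Callable, Sequence

Sampler = Callable[[random.Random], float]


def single_unit_system_failure(
    otu_samplers: Sequence[Sampler],
    draw_mto: Sampler,
    n_trials: int = 100_000,
    seed: int = 0,
) -> float:
    """Estimate P[ union over units i of {OTU_i < MTO} ]."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        mto = draw_mto(rng)  # assumed: one MTO per trial, shared by all units
        # The system fails if at least one unit falls short of the MTO.
        if any(draw_otu(rng) < mto for draw_otu in otu_samplers):
            failures += 1
    return failures / n_trials


# Hypothetical four-unit system: three nominal units and one degraded unit.
nominal = lambda rng: rng.gauss(100.0, 10.0)
degraded = lambda rng: rng.gauss(80.0, 10.0)
p_sys = single_unit_system_failure(
    [nominal, nominal, nominal, degraded],
    draw_mto=lambda rng: rng.gauss(90.0, 10.0),
)
print(f"estimated system P_F (no redundancy) = {p_sys:.3f}")
```

Because the units share the same MTO draw in this sketch, the four failure events are not independent, so the product rule 1 - (1 - p)^4 would not apply in general.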

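For a dual unit failure (assuming redundancy in the system), the
mathematical relation for the probability of failure of the system can
be expressed as:

$$
\begin{aligned}
(P_F)_{\text{Operation 1}} = P\big[\, & \left(\{\mathrm{OTU}_1 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_2 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_1 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_3 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_1 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_4 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_2 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_3 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_2 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_4 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_3 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_4 < \mathrm{MTO}\}\right) \big]
\end{aligned}
\tag{4}
$$

$$
\begin{aligned}
(P_F)_{\text{Operation 2}} = P\big[\, & \left(\{\mathrm{OTU}_5 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_6 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_5 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_7 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_5 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_8 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_6 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_7 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_6 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_8 < \mathrm{MTO}\}\right) \\
\cup\, & \left(\{\mathrm{OTU}_7 < \mathrm{MTO}\} \cap \{\mathrm{OTU}_8 < \mathrm{MTO}\}\right) \big]
\end{aligned}
\tag{5}
$$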
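
The dual unit failure event is the union, over all six pairs of units, of both units in the pair failing, which is the same as at least two of the four units failing. The sketch below evaluates the pairwise form directly. As before, the shared MTO per trial, the sampler interface, and the parameter values are assumptions made for illustration.

```python
import random
from itertools import combinations
from typing import Callable, Sequence

Sampler = Callable[[random.Random], float]


def dual_failure_event(otu_values: Sequence[float], mto: float) -> bool:
    """True if some pair of units both have OTU < MTO."""
    failed = [otu < mto for otu in otu_values]
    return any(failed[i] and failed[j]
               for i, j in combinations(range(len(failed)), 2))


def dual_unit_system_failure(
    otu_samplers: Sequence[Sampler],
    draw_mto: Sampler,
    n_trials: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the dual unit (redundant) system P_F."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        mto = draw_mto(rng)  # assumed: one MTO per trial, shared by all units
        otu_values = [draw_otu(rng) for draw_otu in otu_samplers]
        if dual_failure_event(otu_values, mto):
            failures += 1
    return failures / n_trials


# Hypothetical four-unit system: three nominal units and one degraded unit.
nominal = lambda rng: rng.gauss(100.0, 10.0)
degraded = lambda rng: rng.gauss(80.0, 10.0)
p_dual = dual_unit_system_failure(
    [nominal, nominal, nominal, degraded],
    draw_mto=lambda rng: rng.gauss(90.0, 10.0),
)
print(f"estimated system P_F (dual unit failure) = {p_dual:.3f}")
```

Counting failed units and testing for a count of two or more would give the same result; the pairwise form is used here only because it mirrors the written expression term by term.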

Monte Carlo simulation is used to calculate the probability of
failure of Operation 1 and Operation 2 using the pertinent random values
of OTU of the various units. As can be seen from Equations (2) and (4),
the calculation of (P_F) for Operation 1 incorporates the OTU values of
units 1, 2, 3, and 4. Similarly, it can be seen from Equations (3) and
(5) that the calculation of (P_F) for Operation 2 incorporates the OTU
values of units 5, 6, 7, and 8. Again, as in the case of component
failures, the P_F of the system is calculated using the data for
"SPECIFIC DATA" and "ALL DATA".

RESULTS AND DISCUSSION 
Single Unit Failure 

The results of the reliability analysis with the assumption of a
normal distribution for the random variables are tabulated in Table 6
for single unit failure. Table 7 shows similar results with the
assumption of a uniform distribution. As can be seen from the results,
the predicted change in P_F for the previously failed units is as high
as 49% above the baseline (Perfect System) for the worst case, whether
the random variables are assumed to follow a normal or a uniform
distribution. The maximum change in system probability of failure,
measured with respect to the Perfect System as base, is as high as 36%
across the normal and uniform distribution results.

Dual Unit Failure 

The results of the reliability analysis for dual unit failure are
tabulated in Tables 8 and 9 with the assumption of normal distribution
and uniform distribution, respectively. The predicted change in P_F for
the previously failed units is at the same level as for single unit
failure for both types of distribution. As expected, the change in
system P_F is reduced considerably (with a highest value of 16%) when
dual unit failure is considered.
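
A short set argument shows why the system failure probability with redundancy cannot exceed the value without it. The notation A_i for the event that unit i fails is introduced here for convenience and is not the paper's.

```latex
\begin{aligned}
A_i &= \{\mathrm{OTU}_i < \mathrm{MTO}\}, \qquad i = 1, \dots, 4, \\
E_{\text{dual}} &= \bigcup_{i < j} \left( A_i \cap A_j \right)
  \;\subseteq\; \bigcup_{i} A_i = E_{\text{single}}, \\
\Rightarrow \quad P(E_{\text{dual}}) &\le P(E_{\text{single}}).
\end{aligned}
```

Each pairwise intersection is contained in A_i, so the dual unit event is a subset of the single unit event whatever the dependence between units. This is consistent with the Perfect System base values of system P_F for Operation 1 under the normal distribution: 41.4% in Table 6 against 20.4% in Table 8.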



CONCLUSIONS 


A methodology has been developed in this paper for evaluating the
probability of failure of previously failed units as well as of the
system that uses these units. It has been found that the probability of
failure increases significantly if failed units are returned to stock
for use in future operations of the system without corrective action.
Hence, such units require corrective action to correct these types of
failures before they are reused in the system.

TABLE 1 MAXIMUM TIME OF OPERATION (MTO) OF VARIOUS OPERATIONS

TABLE 4 MAXIMUM TIME OF OPERATION (MTO) 
(OPERATION 1 - "SPECIFIC DATA") 

TABLE 5 MAXIMUM TIME OF OPERATION (MTO)
(OPERATION 2 - "SPECIFIC DATA")

TABLE 6 PROBABILITY OF FAILURES FOR VARIOUS SYSTEMS
(NORMAL DISTRIBUTION)

Single Unit Failure

                       P_F of Unit                  P_F of System*
Unit  Data Used For    Base     Change   Change     Base     Change   Change
No.   Operation        (PS)     (IPS-1)  (IPS-2)    (PS)     (IPS-1)  (IPS-2)

4     Operation 1      17.4%    34.0%    49.0%      41.4%    25.0%    [missing]
4     ALL              46.9%    10.0%    29.0%      64.4%    9.0%     18.0%
8     Operation 2      56.7%    37.0%    41.0%      68.8%    26.0%    29.0%
8     ALL              61.8%    13.0%    25.0%      81.1%    4.0%     11.0%

Base = base value of P_F (Perfect System, PS). Change = change in P_F
from the base value for Imperfect System 1 (IPS-1) and Imperfect
System 2 (IPS-2).
*System for Unit 4 includes Units 1 through 4 for Operation 1.
 System for Unit 8 includes Units 5 through 8 for Operation 2.

TABLE 7 PROBABILITY OF FAILURES FOR VARIOUS SYSTEMS
(UNIFORM DISTRIBUTION)

Single Unit Failure

                       P_F of Unit                  P_F of System*
Unit  Data Used For    Base     Change   Change     Base     Change   Change
No.   Operation        (PS)     (IPS-1)  (IPS-2)    (PS)     (IPS-1)  (IPS-2)

4     Operation 1      19.5%    48.0%    49.0%      53.2%    26.0%    [missing]
4     ALL              62.5%    17.0%    17.0%      75.2%    11.0%    [missing]
8     Operation 2      52.6%    47.0%    47.0%      63.6%    36.0%    [missing]
8     ALL              52.6%    34.0%    34.0%      73.9%    15.0%    [missing]

Values detached from their rows in the source: 26.0%, 36., 16.0%

Base = base value of P_F (Perfect System, PS). Change = change in P_F
from the base value for Imperfect System 1 (IPS-1) and Imperfect
System 2 (IPS-2).
*System for Unit 4 includes Units 1 through 4 for Operation 1.
 System for Unit 8 includes Units 5 through 8 for Operation 2.

TABLE 8 PROBABILITY OF FAILURES FOR VARIOUS SYSTEMS
(NORMAL DISTRIBUTION)

Dual Unit Failure

                       P_F of Unit                  P_F of System*
Unit  Data Used For    Base     Change   Change     Base     Change   Change
No.   Operation        (PS)     (IPS-1)  (IPS-2)    (PS)     (IPS-1)  (IPS-2)

4     Operation 1      17.4%    34.0%    49.0%      20.4%    8.0%     12.0%
4     ALL              46.9%    10.0%    29.0%      48.8%    3.0%     8.0%
8     Operation 2      56.7%    37.0%    41.0%      29.5%    9.0%     10.0%
8     ALL              61.8%    13.0%    25.0%      55.3%    5.0%     9.0%

Base = base value of P_F (Perfect System, PS). Change = change in P_F
from the base value for Imperfect System 1 (IPS-1) and Imperfect
System 2 (IPS-2).
*System for Unit 4 includes Units 1 through 4 for Operation 1.
 System for Unit 8 includes Units 5 through 8 for Operation 2.

TABLE 9 PROBABILITY OF FAILURES FOR VARIOUS SYSTEMS
(UNIFORM DISTRIBUTION)

Dual Unit Failure

                       P_F of Unit                  P_F of System*
Unit  Data Used For    Base     Change   Change     Base     Change   Change
No.   Operation        (PS)     (IPS-1)  (IPS-2)    (PS)     (IPS-1)  (IPS-2)

4     Operation 1      19.5%    48.0%    49.0%      26.5%    16.0%    16.0%
4     ALL              62.5%    17.0%    17.0%      64.7%    4.0%     4.0%
8     Operation 2      52.6%    47.0%    47.0%      35.8%    11.0%    11.0%
8     ALL              52.6%    34.0%    34.0%      55.1%    10.0%    10.0%

Base = base value of P_F (Perfect System, PS). Change = change in P_F
from the base value for Imperfect System 1 (IPS-1) and Imperfect
System 2 (IPS-2).
*System for Unit 4 includes Units 1 through 4 for Operation 1.
 System for Unit 8 includes Units 5 through 8 for Operation 2.